ORA-00600 [kgeade_is_0] and kxfpg1sg

Normally when you get ORA-00600 [string], you search Metalink using the keywords "ORA-600 string" (without the quotes; "600" and "00600" are treated the same on Metalink).

Today I got many occurrences of this error

  ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []

on my 10.2.0.4 2-node RAC database running on Linux x86_64. The trace files kept filling up the filesystem until I set max_dump_file_size to a very small number. I can't find any relevant notes or bugs; the bugs I did find are all related to renaming a file from ASM to a filesystem (see Note:742289.1 for Bug 7207932). I can easily reproduce the error with a query involving a gv$ view, one as simple as

  select * from gv$instance

The trace file has a long call stack:

  ksedst()+31
  ksedmp()+610
  ksfdmp()+21
  kgerinv()+161
  kgeasnmierr()+163
  kgeade()+509
  kgerev()+58
  kserec0()+186
  kxfpg1sg()+1894
  kxfpgsg()+1969
  kxfrAllocSlaves()+351
  kxfrialo()+2080
  kxfralo()+313
  kxfrAllocSlaves()+351
  kxfrialo()+2080
  kxfralo()+313
  qerpx_rowsrc_start()+3844
  qerpxStart()+234
  selexe()+667
  opiexe()+4671
  kpoal8()+2273
  opiodr()+984
  kpoodrc()+38
  rpiswu2()+420
  kpoodr()+1020
  upirtrc()+2164
  kpurcsc()+125
  kpuexecv8()+1710
  kpuexecv8()+1710
  kpuexec()+2602
  OCIStmtExecute()+41
  ktte_aggregate_finfo()+3062
  ktte_monitor_tsth()+772
  ktte_monitor_ts()+355
  ksbcti()+1301
  ksbabs()+804
  kebm_mmon_main()+318
  ksbrdp()+794
  opirip()+616
  opidrv()+582
  sou2o()+114
  opimai_real()+317
  main()+116
  __libc_start_main()+244
  _start()+41

Ignore the number after the +; it's the offset from the function's address. Focus on the functions at the top:

  ksedst -> ksedmp -> ksfdmp -> kgerinv -> kgeasnmierr -> kgeade -> kgerev -> kserec0 -> kxfpg1sg -> kxfpgsg

KS in KSE means the Kernel Service layer, and E probably means Error handling. ksedst dumps the current call stack into this trace file, and ksedmp dumps the process state. kgeade may look interesting because the first argument of our ORA-600 error is kgeade_is_0 (an assertion that kgeade() should not return 0?).
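Stripping the offsets and pulling out the top of the stack is mechanical, so here is a minimal Python sketch of that step. It assumes every frame has the name()+offset form seen in this 10.2 trace; other versions or platforms may format the stack differently. The helper names frame_names and top_chain are hypothetical.

```python
import re

# Assumption: each frame looks like "name()+offset", e.g. "kgeade()+509",
# as in the 10.2.0.4 Linux trace above.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\(\)\+\d+")


def frame_names(stack_text):
    """Return function names in top-down order, dropping the +offset part."""
    return FRAME_RE.findall(stack_text)


def top_chain(stack_text, n=10):
    """Join the first n function names with ' -> ', like the chain in the note."""
    return " -> ".join(frame_names(stack_text)[:n])


if __name__ == "__main__":
    stack = ("ksedst()+31 ksedmp()+610 ksfdmp()+21 kgerinv()+161 "
             "kgeasnmierr()+163 kgeade()+509 kgerev()+58 kserec0()+186 "
             "kxfpg1sg()+1894 kxfpgsg()+1969 kxfrAllocSlaves()+351")
    # prints: ksedst -> ksedmp -> ksfdmp -> kgerinv -> kgeasnmierr -> kgeade -> kgerev -> kserec0 -> kxfpg1sg -> kxfpgsg
    print(top_chain(stack))
```

The pattern works whether the frames are on one line or one per line, since findall scans the whole text.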
According to Bug 6954816, kgeade is "KGE ADd Error onto the error stack". This simply confirms that any function near the top of the stack whose name begins with kge is not worth looking into, even if it (or a variant of its name) is the first argument of the ORA-600 error. After all, if the code has already reached the error handling routine, it has already passed the bad function that triggered the error. With that in mind, let's search for the first function (the last one in order of time) that is neither a kge routine nor one of the kse error handling or trace dumping routines above it (ksedst, ksedmp, ksfdmp, kserec0). In my case it is kxfpg1sg.

It didn't take me long to find out what this function is related to. It appears in one of my old snapshots of v$latch_misses as a value of LOCATION, and the latch that uses it is "query server process". That clearly indicates a relationship with parallel execution processes, which Oracle uses when you query a gv$ view.

On Metalink, the most relevant hit may be Bug 5072023 (ORA-00600 [15735] RUNNING A QUERY ON GV$ AND DBA_* VIEWS). Unfortunately it offers no workaround, but it points out that "The error ORA-600 means that the message to join the group (for the PQ slaves) is too large". Another good hit is Note:455202.1 (ORA-00600[15735] WHEN QUERYING A TABLE WHOSE PARALLEL DEGREE IS >1). Although the triggering event there is not a query on a gv$ view, the note suggests lowering parallel_execution_message_size. Indeed, earlier I had manually increased this parameter from its meager 2152 to 16384, the maximum for pre-11g RAC (Note:6394739.8). I lowered the value back and bounced the entire database, and the error is no longer generated.

The moral of the story: read the call stack top down, but skip ALL error handling and trace dumping functions, even when the first argument of the ORA-600 (or ORA-7445, for that matter) points at one of them.
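The reading rule in the moral can be sketched as a small Python filter. This is only a heuristic under stated assumptions: the skip list (names beginning with kge or kse, plus ksfdmp) is drawn from this one stack, not from any official Oracle classification of kernel functions, and the helper names are hypothetical.

```python
import re

# Assumption: frames look like "name()+offset", as in the trace above.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\(\)\+\d+")

# Assumed skip list, based only on this note's stack:
#   kge*   -> generic error stack handling (kgerinv, kgeasnmierr, kgeade, kgerev)
#   kse*   -> kernel service error handling / trace dumping (ksedst, ksedmp, kserec0)
#   ksfdmp -> dump routine called while writing the trace
SKIP_PREFIXES = ("kge", "kse")
SKIP_NAMES = {"ksfdmp"}


def is_error_or_dump_frame(name):
    """True if the frame looks like error handling or trace dumping."""
    return name.startswith(SKIP_PREFIXES) or name in SKIP_NAMES


def first_suspect_frame(stack_text):
    """Read the stack top down and return the first frame that is not
    error handling or dumping, or None if every frame is skipped."""
    for name in FRAME_RE.findall(stack_text):
        if not is_error_or_dump_frame(name):
            return name
    return None


if __name__ == "__main__":
    stack = ("ksedst()+31 ksedmp()+610 ksfdmp()+21 kgerinv()+161 "
             "kgeasnmierr()+163 kgeade()+509 kgerev()+58 kserec0()+186 "
             "kxfpg1sg()+1894 kxfpgsg()+1969")
    print(first_suspect_frame(stack))  # prints: kxfpg1sg
```

A name-prefix filter is crude, but it mirrors how the stack was read here: the frame it returns is only a starting point for searching v$latch_misses or Metalink, not a diagnosis.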
*************
2009-02 update:

-----
Michael.B.JonesATwellsfargoDOTcom wrote:
> Your note at http://yong321.freeshell.org/oranotes/ORA-600%5Bkgeade_is_0%5D.txt proved very
> helpful in resolving this issue. What I found was the answer to the problem, however, was not
> to change the parallel_execution_message_size parameter setting back to 2K, but to make the value
> match on all nodes of our RAC database. A setting of 8K helps processing a lot and does not
> generate the error. A mismatch between values on different nodes caused the error and synching
> them stopped the core dumps immediately. I tested and confirmed it. Thanks!

In fact, the Oracle Reference says of this parameter: "Multiple instances must have the same value."
*************
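The fix from the update, keeping parallel_execution_message_size identical on every node, is easy to check once you have each node's value. A minimal Python sketch, assuming the values were collected per instance, for example by running select value from v$parameter where name = 'parallel_execution_message_size' on each node separately (a gv$ query would itself start parallel slaves, which is what failed here). The function name and the sample readings are hypothetical.

```python
from typing import Dict, List, Tuple

PARAM = "parallel_execution_message_size"


def check_same_value(values_by_instance: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return [] if every instance reports the same value; otherwise return
    all (instance, value) pairs, sorted by instance, for reporting."""
    if len(set(values_by_instance.values())) <= 1:
        return []
    return sorted(values_by_instance.items())


if __name__ == "__main__":
    # Hypothetical per-node readings showing a mismatch.
    readings = {1: 16384, 2: 2152}
    for inst, value in check_same_value(readings):
        print(f"instance {inst}: {PARAM} = {value}")
```

Reporting every instance rather than just the odd one out avoids guessing which node holds the "right" value.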